{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further Hypothesis Testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select this cell and type Ctrl-Enter to execute the code below.\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "data = pd.read_csv(\"stars.csv\")\n",
    "type_key = ['Brown Dwarf', 'Red Dwarf', 'White Dwarf', 'Main Sequence', 'Supergiant','Hypergiant']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4 - Comparing means of more than two groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During the course of your investigations for Dr Howe, you have noticed that the distributions of the dwarf stars' luminosities (types 0,1 and 2) are also overlapping."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "types = [0,1,2]\n",
    "\n",
    "sample = data.luminosity.apply(np.log)\n",
    "grouped = sample.groupby(data.type)\n",
    "\n",
    "xlab = 'log(luminosity)'\n",
    "ylab = 'freq'\n",
    "\n",
    "displayed = pd.concat([grouped.get_group(t) for t in types])\n",
    "bins = np.linspace(displayed.min(), displayed.max(), 20)\n",
    "ax = plt.axes()\n",
    "ax.set_xlabel(xlab)\n",
    "ax.set_ylabel(ylab)\n",
    "\n",
    "for t in types:\n",
    "    plt.hist(grouped.get_group(t), bins, alpha=0.5, label=type_key[t], color='C' + str(t))\n",
    "\n",
    "ax.legend(loc='upper right')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question: do types 0, 1 and 2 have the same mean luminosity?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The t-test can only compare the means of two samples (or one sample with a theoretical mean). \n",
    "\n",
    "To move beyond two samples, we need to use a different method called [*analysis of variance*](https://en.wikipedia.org/wiki/Analysis_of_variance) (ANOVA)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### One-way ANOVA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Theory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$H_0$: All of the groups have identical means:  $\\mu = \\mu_1 = \\mu_2 = \\mu_3$.\n",
    "\n",
    "$H_1$: Not all of the group means are identical.\n",
    "\n",
    "The one-way (also known as single-factor) ANOVA uses the F-test to compare the within-group and between-group variation:\n",
    "\n",
    "$$F = \\frac{\\text{between-group variation}}{\\text{within-group variation}}$$\n",
    "\n",
    "Under $H_0$, $F$ follows an F-distribution with parameters $(g-1,n_T-g)$, where $g$ is the number of groups (here, 3 types of star), and $n_T$ is the total number of observations.\n",
    "\n",
    "Once again, the F-distribution provides a p-value associated with the calculated value of $F$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Assumptions\n",
    "\n",
    "- Observations are independent.\n",
    "- Populations are normally distributed.\n",
    "- Variances of the populations are equal.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Application\n",
    "\n",
    "We will set $\\alpha=0.05$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.f_oneway(grouped.get_group(0),grouped.get_group(1),grouped.get_group(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we have $p<\\alpha$, so we reject $H_0$: the three groups do not appear to have the same mean luminosity."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other types of ANOVA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ANOVA is an important element of statistical analysis when we are interested in comparing the effects of different treatments. \n",
    "\n",
    "The underlying statistical model changes, depending on the expected relationship between treatment and effect (*fixed-*, *random-* or *mixed-effects*).\n",
    "\n",
    "Where multiple variables change simultaneously (for example, in patient populations), we may need to consider *multiple factors* (e.g. *two-way ANOVA*) and the *interactions* between factors.\n",
    "\n",
    "<br>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
